Latest Economics NCERT Notes, Solutions and Extra Q & A (Class 9th to 12th)
Chapter 3 Organisation Of Data
After collecting raw data, the next essential step in statistical analysis is to organize or classify it. Just as a junk dealer sorts items to manage their trade efficiently or you arrange books by subject for easier access, classifying data brings order to unorganized information. This organization makes the data easier to understand, compare, and analyze using statistical methods. Classification involves arranging data into groups or classes based on specific criteria, ensuring that similar characteristics are placed together.
Introduction
This chapter follows the data collection process by explaining how to classify collected data. Classification is crucial for organizing raw data and making it suitable for statistical analysis. Like sorting junk or arranging books by subject, classifying data brings order, making it easier to manage and retrieve information.
Raw Data
Raw data refers to unclassified or unorganized data. Like the junk dealer's unorganized collection, raw data can be large, cumbersome, and difficult to handle. Extracting meaningful conclusions or insights from raw data is challenging because it is not readily amenable to systematic statistical analysis. Therefore, organizing and classifying raw data is a necessary step after collection and before undertaking any systematic analysis. Tables 3.1 and 3.2 provide examples of raw data, showing unarranged numbers.
Table 3.1 Marks in Mathematics Obtained by 100 Students in an Examination:
47 | 45 | 10 | 60 | 51 | 56 | 66 | 100 | 49 | 40 |
60 | 59 | 56 | 55 | 62 | 48 | 59 | 55 | 51 | 41 |
42 | 69 | 64 | 66 | 50 | 59 | 57 | 65 | 62 | 50 |
64 | 30 | 37 | 75 | 17 | 56 | 20 | 14 | 55 | 90 |
62 | 51 | 55 | 14 | 25 | 34 | 90 | 49 | 56 | 54 |
70 | 47 | 49 | 82 | 40 | 82 | 60 | 85 | 65 | 66 |
49 | 44 | 64 | 69 | 70 | 48 | 12 | 28 | 55 | 65 |
49 | 40 | 25 | 41 | 71 | 80 | 0 | 56 | 14 | 22 |
66 | 53 | 46 | 70 | 43 | 61 | 59 | 12 | 30 | 35 |
45 | 44 | 57 | 76 | 82 | 39 | 32 | 14 | 90 | 25 |
Table 3.2 Monthly Household Expenditure (in Rupees) on Food of 50 Households:
1904 | 1559 | 3473 | 1735 | 2760 |
2041 | 1612 | 1753 | 1855 | 4439 |
5090 | 1085 | 1823 | 2346 | 1523 |
1211 | 1360 | 1110 | 2152 | 1183 |
1218 | 1315 | 1105 | 2628 | 2712 |
4248 | 1812 | 1264 | 1183 | 1171 |
1007 | 1180 | 1953 | 1137 | 2048 |
2025 | 1583 | 1324 | 2621 | 3676 |
1397 | 1832 | 1962 | 2177 | 2575 |
1293 | 1365 | 1146 | 3222 | 1396 |
Raw data is summarized and made comprehensible through classification. Grouping similar facts together simplifies locating information, making comparisons, and drawing inferences. For example, Census data, initially vast and fragmented, becomes understandable when classified by characteristics like gender, education, and occupation.
Classification Of Data
Data classification involves arranging or organizing data into groups or classes based on certain criteria. The method of classification depends on the purpose of the analysis. Data can be classified in several ways:
- Chronological Classification: Data is grouped according to time, either in ascending or descending order (e.g., population over different years - Example 1). A variable classified chronologically is a **Time Series**.
- Spatial Classification: Data is classified based on geographical location (e.g., yield of wheat in different countries - Example 2).
- Qualitative Classification: Data is classified based on qualitative characteristics or attributes that cannot be measured numerically (e.g., gender, literacy, religion). This can involve classifying based on the presence or absence of an attribute or further subdividing classes based on other attributes (Example 3).
- Quantitative Classification: Data for characteristics that are quantitative (can be measured numerically, like height, weight, age, income, marks) are grouped into classes (Example 4, Frequency Distribution of Marks).
Example 1: Population of India (in crores) - Chronological Classification
Year | Population (Crores) |
---|---|
1951 | 35.7 |
1961 | 43.8 |
1971 | 54.6 |
1981 | 68.4 |
1991 | 81.8 |
2001 | 102.7 |
2011 | 121.0 |
Example 2: Yield of Wheat for Different Countries (2013) - Spatial Classification
Country | Yield of wheat (kg/hectare) |
---|---|
Canada | 3594 |
China | 5055 |
France | 7254 |
Germany | 7998 |
India | 3154 |
Pakistan | 2787 |
Example 3: Qualitative Classification of Population by Gender and Marital Status
Population | | | |
---|---|---|---|
Male | | Female | |
Married | Unmarried | Married | Unmarried |
Example 4: Frequency Distribution of Marks in Mathematics of 100 Students - Quantitative Classification
Marks | Frequency |
---|---|
0–10 | 1 |
10–20 | 8 |
20–30 | 6 |
30–40 | 7 |
40–50 | 21 |
50–60 | 23 |
60–70 | 19 |
70–80 | 6 |
80–90 | 5 |
90–100 | 4 |
Total | 100 |
Variables: Continuous And Discrete
Variables can be classified as continuous or discrete based on the values they can take.
- Continuous Variable: Can take any numerical value within a range (integral, fractional, or irrational). Its value changes smoothly without jumps (e.g., height, weight, time, distance).
- Discrete Variable: Can take only certain values and changes in finite jumps, with no intermediate value between two adjacent permissible values (e.g., number of students in a class, number of cars on the road, which are typically whole numbers). A discrete variable may still take fractional values: if, for instance, it is defined to take only the values 1/2, 1/4, 1/8, and so on, it cannot take any value lying between two adjacent ones, such as between 1/4 and 1/2.
What Is A Frequency Distribution?
A frequency distribution is a way to classify raw data of a quantitative variable. It shows how frequently different values or ranges of values of a variable occur within specific classes.
In a frequency distribution table (like Example 4):
- Classes: Ranges into which the data is grouped (e.g., 0–10, 10–20, etc.).
- Class Frequency: The number of observations that fall within a particular class (e.g., 7 students scored between 30 and 40 marks).
- Class Limits: The boundary values of a class. The lower value is the **Lower Class Limit**, and the higher value is the **Upper Class Limit**. (e.g., for class 60–70, the lower limit is 60, upper limit is 70).
- Class Interval or Class Width: The difference between the upper and lower class limits (e.g., for 60–70, the interval is 10).
- Class Mid-Point or Class Mark: The middle value of a class, calculated as (Upper Limit + Lower Limit)/2. This value represents the entire class in further statistical calculations.
Table 3.3: The Lower Class Limits, the Upper Class Limits and the Class Mark:
Class | Frequency | Lower Class Limit | Upper Class Limit | Class Mark |
---|---|---|---|---|
0–10 | 1 | 0 | 10 | 5 |
10–20 | 8 | 10 | 20 | 15 |
20–30 | 6 | 20 | 30 | 25 |
30–40 | 7 | 30 | 40 | 35 |
40–50 | 21 | 40 | 50 | 45 |
50–60 | 23 | 50 | 60 | 55 |
60–70 | 19 | 60 | 70 | 65 |
70–80 | 6 | 70 | 80 | 75 |
80–90 | 5 | 80 | 90 | 85 |
90–100 | 4 | 90 | 100 | 95 |
Total | 100 |
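As a minimal sketch, the class width and class mark columns of Table 3.3 can be recomputed from the class limits alone. The helper names `class_width` and `class_mark` are illustrative and do not come from the text.

```python
def class_width(lower, upper):
    """Class interval (width): upper class limit minus lower class limit."""
    return upper - lower


def class_mark(lower, upper):
    """Class mid-point: (upper class limit + lower class limit) / 2."""
    return (upper + lower) / 2


# Class limits of Table 3.3: 0-10, 10-20, ..., 90-100.
limits = [(lower, lower + 10) for lower in range(0, 100, 10)]

widths = [class_width(lo, hi) for lo, hi in limits]
marks = [class_mark(lo, hi) for lo, hi in limits]

for (lo, hi), w, m in zip(limits, widths, marks):
    print(f"{lo}-{hi}: width {w}, class mark {m}")
```

Every class here has width 10, and the class marks run from 5 to 95 in steps of 10, matching the last column of Table 3.3.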
A **Frequency Curve** is a graphical representation of a frequency distribution, plotting class marks on the X-axis and frequencies on the Y-axis (Fig. 3.1).
How To Prepare A Frequency Distribution?
Constructing a frequency distribution involves addressing several decisions:
Should We Have Equal Or Unequal Sized Class Intervals?
Unequal intervals are used when the data range is very wide (e.g., income from near zero to very high values) or when observations are highly concentrated in a small part of the range. Otherwise, equal-sized intervals are preferred.
How Many Classes Should We Have?
The number of classes is usually between 6 and 15. With equal intervals, it can be estimated by dividing the range (difference between largest and smallest values) by the class interval size.
What Should Be The Size Of Each Class?
This depends on the number of classes and the data range, as these are interlinked decisions. In Example 4, with a range of 100 and 10 classes, the equal class interval is 10.
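The link between range, number of classes, and class size can be sketched as simple arithmetic. The function names below are hypothetical, and rounding the class count up to a whole number is an assumption of this sketch rather than a rule stated in the text.

```python
import math


def number_of_classes(smallest, largest, class_width):
    """Estimate the class count as range / class width, rounded up (assumption)."""
    data_range = largest - smallest
    return math.ceil(data_range / class_width)


def equal_class_width(smallest, largest, n_classes):
    """Equal class width implied by a chosen number of classes."""
    return (largest - smallest) / n_classes


# Example 4: marks run from 0 to 100.
n = number_of_classes(0, 100, 10)
width = equal_class_width(0, 100, 10)
print(n, width)
```

Fixing any two of the three quantities (range, number of classes, class width) determines the third, which is why the text treats them as interlinked decisions.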
How Should We Determine The Class Limits?
Class limits should be clear and definite, ideally avoiding open-ended classes. They should be set so that observations within a class are concentrated around the class midpoint. Class intervals can be Inclusive (limits included in the class) or Exclusive (one limit, usually upper, excluded). Exclusive intervals are common for continuous variables to maintain continuity.
Example of Inclusive Class Intervals (Discrete Variable, Marks 0-100):
- 0-10 (includes 0 and 10)
- 11-20 (includes 11 and 20)
Example of Exclusive Class Intervals (Discrete Variable, Marks 0-100):
- 0-10 (includes 0, excludes 10)
- 10-20 (includes 10, excludes 20)
Example of Inclusive Class Intervals (Continuous Variable, Weight):
- 30 kg - 39.999... kg (includes 30 kg and every weight below 40 kg; 40 kg itself belongs to the next class)
For continuous variables represented using the inclusive method, an adjustment is needed to create continuity between classes.
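The difference between the two methods comes down to how a value equal to a class limit is treated. The sketch below assumes the usual exclusive convention in which the upper limit is the one left out; `in_class` is a hypothetical helper name.

```python
def in_class(value, lower, upper, method="exclusive"):
    """Return True if value belongs to the class lower-upper.

    "inclusive": both class limits belong to the class.
    "exclusive": the upper class limit is left out (assumed convention).
    """
    if method == "inclusive":
        return lower <= value <= upper
    if method == "exclusive":
        return lower <= value < upper
    raise ValueError("method must be 'inclusive' or 'exclusive'")


# A mark of exactly 10 under each scheme:
print(in_class(10, 0, 10, "inclusive"))   # True: the class 0-10 includes 10
print(in_class(10, 0, 10, "exclusive"))   # False: 10 is excluded from 0-10
print(in_class(10, 10, 20, "exclusive"))  # True: 10 opens the class 10-20
```

Under the exclusive method a boundary value always lands in exactly one class, which is why consecutive classes can share a limit without double counting.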
Adjustment In Class Interval
To adjust inclusive class intervals for continuous data (e.g., Table 3.4), find the gap between the upper limit of a class and the lower limit of the next class, divide it by two, subtract the result from all lower limits, and add it to all upper limits. In Table 3.4 the gap is 900 − 899 = 1, so 0.5 is subtracted from each lower limit and added to each upper limit. This creates adjusted class limits (e.g., Table 3.5) and adjusted class marks.
Table 3.4: Frequency Distribution of Incomes of 550 Employees of a Company (Inclusive)
Income (Rs) | Number of Employees |
---|---|
800–899 | 50 |
900–999 | 100 |
1000–1099 | 200 |
1100–1199 | 150 |
1200–1299 | 40 |
1300–1399 | 10 |
Total | 550 |
Table 3.5: Frequency Distribution of Incomes of 550 Employees of a Company (Adjusted/Exclusive)
Income (Rs) | Number of Employees |
---|---|
799.5–899.5 | 50 |
899.5–999.5 | 100 |
999.5–1099.5 | 200 |
1099.5–1199.5 | 150 |
1199.5–1299.5 | 40 |
1299.5–1399.5 | 10 |
Total | 550 |
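The adjustment that turns Table 3.4 into Table 3.5 can be sketched in a few lines. The function name `adjust_limits` is illustrative, and the sketch assumes the gap between consecutive classes is the same throughout, as it is in Table 3.4.

```python
def adjust_limits(classes):
    """Convert inclusive class limits into continuous ones.

    classes: list of (lower, upper) tuples in ascending order.
    Assumes the same gap between every pair of consecutive classes.
    """
    # Gap between the first class's upper limit and the next class's lower limit.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


# Inclusive classes of Table 3.4.
inclusive = [(800, 899), (900, 999), (1000, 1099),
             (1100, 1199), (1200, 1299), (1300, 1399)]

adjusted = adjust_limits(inclusive)
adjusted_marks = [(lo + hi) / 2 for lo, hi in adjusted]

print(adjusted[0])        # (799.5, 899.5)
print(adjusted_marks[0])  # 849.5
```

After the adjustment the upper limit of each class equals the lower limit of the next, so the classes are continuous, and the class marks are recomputed from the adjusted limits.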
How Should We Get The Frequency For Each Class?
The frequency of an observation is the number of times it appears in the raw data. Class frequency is the number of observations in a class, and it is counted using **tally marks**.
Finding Class Frequency by Tally Marking
A tally (/) is marked for each observation falling into a class. Tallies are grouped in fives for easier counting: after four tallies (////), the fifth is drawn diagonally across them to close the bundle. The total number of tallies in a class is its frequency (Table 3.6).
Table 3.6: Tally Marking of Marks of 100 Students in Mathematics:
Class | Observations | Tally Marks | Frequency | Class Mark |
---|---|---|---|---|
0–10 | 0 | / | 1 | 5 |
10–20 | 10, 14, 17, 12, 14, 12, 14, 14 | ///// /// | 8 | 15 |
20–30 | 25, 25, 20, 22, 25, 28 | ///// / | 6 | 25 |
30–40 | 30, 37, 34, 39, 32, 30, 35 | ///// // | 7 | 35 |
40–50 | 47, 42, 49, 49, 45, 45, 47, 44, 40, 44, 49, 46, 41, 40, 43, 48, 48, 49, 49, 40, 41 | ///// ///// ///// ///// / | 21 | 45 |
50–60 | 59, 51, 53, 56, 55, 57, 55, 51, 50, 56, 59, 56, 59, 57, 59, 55, 56, 51, 55, 56, 55, 50, 54 | ///// ///// ///// ///// /// | 23 | 55 |
60–70 | 60, 64, 62, 66, 69, 64, 64, 60, 66, 69, 62, 61, 66, 60, 65, 62, 65, 66, 65 | ///// ///// ///// //// | 19 | 65 |
70–80 | 70, 75, 70, 76, 70, 71 | ///// / | 6 | 75 |
80–90 | 82, 82, 82, 80, 85 | ///// | 5 | 85 |
90–100 | 90, 100, 90, 90 | //// | 4 | 95 |
Total | | | 100 | |
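The counting that tally marking does by hand can be sketched on the raw marks of Table 3.1. The helper names are hypothetical. Two assumptions are made: a mark of 100 is placed in the last class (as Table 3.6 does), and each tally is written as one slash, so a complete bundle of five appears as five slashes in plain text.

```python
# Raw marks of 100 students (Table 3.1), row by row.
marks = [
    47, 45, 10, 60, 51, 56, 66, 100, 49, 40,
    60, 59, 56, 55, 62, 48, 59, 55, 51, 41,
    42, 69, 64, 66, 50, 59, 57, 65, 62, 50,
    64, 30, 37, 75, 17, 56, 20, 14, 55, 90,
    62, 51, 55, 14, 25, 34, 90, 49, 56, 54,
    70, 47, 49, 82, 40, 82, 60, 85, 65, 66,
    49, 44, 64, 69, 70, 48, 12, 28, 55, 65,
    49, 40, 25, 41, 71, 80, 0, 56, 14, 22,
    66, 53, 46, 70, 43, 61, 59, 12, 30, 35,
    45, 44, 57, 76, 82, 39, 32, 14, 90, 25,
]


def class_frequencies(values, lower=0, upper=100, width=10):
    """Count observations in the classes lower-(lower+width), ..., up to upper.

    Each class excludes its upper limit, except that a value equal to the
    overall upper limit is counted in the last class (assumption).
    """
    n_classes = (upper - lower) // width
    freq = [0] * n_classes
    for v in values:
        index = min((v - lower) // width, n_classes - 1)
        freq[index] += 1
    return freq


def tally(count):
    """Plain-text tally: one slash per observation, in bundles of five."""
    bundles = ["/////"] * (count // 5)
    if count % 5:
        bundles.append("/" * (count % 5))
    return " ".join(bundles)


freq = class_frequencies(marks)
for i, f in enumerate(freq):
    print(f"{i * 10}-{(i + 1) * 10} | {tally(f)} | {f}")
```

The resulting frequencies agree with the Frequency column of Table 3.6 and sum to 100.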
Loss Of Information
Classifying data into a frequency distribution involves a loss of detailed information. Once data is grouped, individual observation values are lost, and only the class frequency and class mark are used in further calculations. While this summarizes data, it means less detailed information is available compared to raw data.
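A small illustration of this loss, using the eight observations recorded for the class 10–20 in Table 3.6: once the data is grouped, every observation in the class is treated as if it were equal to the class mark. The variable names are illustrative.

```python
# Observations recorded in the class 10-20 (Table 3.6).
observations = [10, 14, 17, 12, 14, 12, 14, 14]

frequency = len(observations)            # 8
class_mark = (10 + 20) / 2               # 15.0

actual_total = sum(observations)         # 107, known only from the raw data
grouped_total = frequency * class_mark   # 120.0, all the grouped data can give

print(actual_total / frequency)  # 13.375, the true average within the class
print(class_mark)                # 15.0, the value used after grouping
```

The grouped figure differs from the true one because the individual values are no longer available; the closer the observations cluster around the class mark, the smaller this discrepancy becomes.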
Frequency Distribution With Unequal Classes
Frequency distributions can have unequal class intervals, especially when data is concentrated in certain ranges. This allows for more representative class marks in those ranges (e.g., splitting wider classes into narrower ones where data is dense - Table 3.7).
Table 3.7: Frequency Distribution of Unequal Classes:
Class | Observations | Frequency | Class Mark |
---|---|---|---|
0–10 | 0 | 1 | 5 |
10–20 | 10, 14, 17, 12, 14, 12, 14, 14 | 8 | 15 |
20–30 | 25, 25, 20, 22, 25, 28 | 6 | 25 |
30–40 | 30, 37, 34, 39, 32, 30, 35 | 7 | 35 |
40–45 | 42, 44, 40, 44, 41, 40, 43, 40, 41 | 9 | 42.5 |
45–50 | 47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49 | 12 | 47.5 |
50–55 | 51, 53, 51, 50, 51, 50, 54 | 7 | 52.5 |
55–60 | 59, 56, 55, 57, 55, 56, 59, 56, 59, 57, 59, 55, 56, 55, 56, 55 | 16 | 57.5 |
60–65 | 60, 64, 62, 64, 64, 60, 62, 61, 60, 62 | 10 | 62.5 |
65–70 | 66, 69, 66, 69, 66, 65, 65, 66, 65 | 9 | 67.5 |
70–80 | 70, 75, 70, 76, 70, 71 | 6 | 75 |
80–90 | 82, 82, 82, 80, 85 | 5 | 85 |
90–100 | 90, 100, 90, 90 | 4 | 95 |
Total | | 100 | |
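Counting with unequal classes only requires a list of class edges in place of a fixed width. The sketch below, which uses Python's standard `bisect` module and a hypothetical helper `count_by_edges`, splits the 21 observations of the class 40–50 into the two narrower classes used in Table 3.7.

```python
from bisect import bisect_right


def count_by_edges(values, edges):
    """Count values in the classes [edges[i], edges[i+1]); widths may differ."""
    freq = [0] * (len(edges) - 1)
    for v in values:
        # bisect_right gives the number of edges <= v; minus 1 is the class index.
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(freq):
            freq[i] += 1
    return freq


# The 21 observations of the class 40-50 (Table 3.6).
obs_40_50 = [47, 42, 49, 49, 45, 45, 47, 44, 40, 44, 49,
             46, 41, 40, 43, 48, 48, 49, 49, 40, 41]

print(count_by_edges(obs_40_50, [40, 50]))      # [21]
print(count_by_edges(obs_40_50, [40, 45, 50]))  # [9, 12]

# Class marks of the two narrower classes.
print((40 + 45) / 2, (45 + 50) / 2)             # 42.5 47.5
```

The same function handles the full set of edges of Table 3.7 (0, 10, 20, 30, 40, 45, ..., 70, 80, 90, 100), provided the top value of 100 is dealt with separately, since this sketch excludes the upper limit of every class.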
Frequency Array
For a discrete variable, the classification of its data is a **Frequency Array**. It shows the frequency for each distinct value the variable takes (Table 3.8).
Table 3.8: Frequency Array of the Size of Households:
Size of the Household | Number of Households |
---|---|
1 | 5 |
2 | 15 |
3 | 25 |
4 | 35 |
5 | 10 |
6 | 5 |
7 | 3 |
8 | 2 |
Total | 100 |
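A frequency array can be sketched with `collections.Counter` from Python's standard library. The ten household sizes below are made-up illustrative values, not the data behind Table 3.8.

```python
from collections import Counter

# Hypothetical household sizes for ten households (not from Table 3.8).
household_sizes = [4, 3, 2, 4, 5, 3, 4, 1, 3, 4]

# Count how often each distinct value of the discrete variable occurs.
frequency_array = Counter(household_sizes)

print("Size | Households")
for size in sorted(frequency_array):
    print(size, "|", frequency_array[size])
print("Total |", sum(frequency_array.values()))
```

Because the variable is discrete, each distinct value gets its own row and no class intervals are needed.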
Bivariate Frequency Distribution
When data is collected for two variables from each unit of a sample (bivariate data), it can be summarized in a **Bivariate Frequency Distribution**. This distribution shows the frequency of observations for combinations of classes of the two variables (Table 3.9).
Table 3.9: Bivariate Frequency Distribution of Sales (in Lakh Rs) and Advertisement Expenditure (in Thousand Rs) of 20 Firms:
Advertisement Expenditure (Thousand Rs) / Sales (Lakh Rs) | 115–125 | 125–135 | 135–145 | 145–155 | 155–165 | 165–175 | Total |
---|---|---|---|---|---|---|---|
62–64 | 2 | 1 | | | | | 3 |
64–66 | 1 | | 3 | | | | 4 |
66–68 | 1 | 1 | 2 | 1 | | | 5 |
68–70 | | 2 | | 2 | | | 4 |
70–72 | | 1 | 1 | | 1 | 1 | 4 |
Total | 4 | 5 | 6 | 3 | 1 | 1 | 20 |
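A bivariate frequency distribution counts pairs of classes rather than single classes. The sketch below uses five made-up firms (not the 20 firms of Table 3.9) and a hypothetical helper `class_of`; the class origins (115 for sales, 62 for advertisement) and widths (10 and 2) follow the table's layout.

```python
from collections import Counter


def class_of(value, lower, width):
    """Return the exclusive class (lo, hi) of the given width containing value."""
    lo = lower + ((value - lower) // width) * width
    return (lo, lo + width)


# Hypothetical (sales in lakh Rs, advertisement in thousand Rs) pairs.
firms = [(118, 62), (121, 63), (130, 63), (138, 65), (140, 65)]

# Each cell is keyed by (advertisement class, sales class).
cells = Counter(
    (class_of(ad, 62, 2), class_of(sales, 115, 10)) for sales, ad in firms
)

# Row totals: add each advertisement class across all sales classes.
row_totals = Counter()
for (ad_class, _sales_class), count in cells.items():
    row_totals[ad_class] += count

for cell, count in sorted(cells.items()):
    print(cell, count)
print(dict(row_totals))
```

Each key of `cells` corresponds to one cell of the two-way table, and summing over one variable gives the marginal totals shown in the Total row and column.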
Conclusion
Raw data collected from various sources needs to be classified for effective statistical analysis. Classification organizes the data, making it ordered and manageable. A frequency distribution is a comprehensive method for classifying quantitative data, showing how values are distributed across classes with their frequencies. Understanding techniques like forming classes (equal/unequal intervals, number of classes, limits), adjusting for continuity, and tally marking is crucial for constructing frequency distributions. While classifying data involves some loss of detail, the gain in making the data comprehensible for analysis outweighs this. Frequency arrays are used for discrete variables, and bivariate frequency distributions summarize data for two variables simultaneously.
Recap:
- Classification brings order to raw data.
- A Frequency Distribution shows how the values of a variable are distributed across classes, together with the frequency of each class.
- Exclusive Method excludes one class limit; Inclusive Method includes both limits.
- Statistical calculations in classified data use class midpoints.
- Classes should be formed so class marks represent observations well.
- Classification involves some loss of information.
- A Frequency Array is for discrete variables; a Bivariate Frequency Distribution is for two variables.